Last data update: Apr 22, 2024. (Total: 46599 publications since 2009)
Records 1-30 (of 30 Records) |
Query Trace: Skums P[original query] |
---|
Network analysis of the chronic Hepatitis C virome defines HVR1 evolutionary phenotypes in the context of humoral immune responses.
Palmer BA , Schmidt-Martin D , Dimitrova Z , Skums P , Crosbie O , Kenny-Walsh E , Fanning LJ . J Virol 2015 90 (7) 3318-29 Hypervariable region 1 (HVR1) of hepatitis C virus (HCV) comprises the first 27 N-terminal amino acid residues of E2. It is classically seen as the most heterogeneous region of the HCV genome. In this study, we assessed HVR1 evolution by using ultradeep pyrosequencing for a cohort of treatment-naive, chronically infected patients over a short, 16-week period. Organization of the sequence set into connected components that represented single nucleotide substitution events revealed a network dominated by highly connected, centrally positioned master sequences. HVR1 phenotypes were observed to be under strong purifying (stationary) and strong positive (antigenic drift) selection pressures, which were coincident with advancing patient age and cirrhosis of the liver. It followed that stationary viromes were dominated by a single HVR1 variant surrounded by minor variants comprised from conservative single amino acid substitution events. We present evidence to suggest that neutralization antibody efficacy was diminished for stationary-virome HVR1 variants. Our results identify the HVR1 network structure during chronic infection as the preferential dominance of a single variant within a narrow sequence space. IMPORTANCE: HCV infection is often asymptomatic, and chronic infection is generally well established in advance of initial diagnosis and subsequent treatment. HVR1 can undergo rapid sequence evolution during acute infection, and the variant pool is typically seen to diverge away from ancestral sequences as infection progresses from the acute to the chronic phase. In this report, we describe HVR1 viromes in chronically infected patients that are defined by a dominant epitope located centrally within a narrow variant pool. 
Our findings suggest that weakened humoral immune activity, as a consequence of persistent chronic infection, allows for the acquisition and maintenance of host-specific adaptive mutations at HVR1 that reflect virus fitness. |
Evaluation of viral heterogeneity using next-generation sequencing, end-point limiting-dilution and mass spectrometry.
Dimitrova Z , Campo DS , Ramachandran S , Vaughan G , Ganova-Raeva L , Lin Y , Forbi JC , Xia G , Skums P , Pearlman B , Khudyakov Y . In Silico Biol 2011 11 183-92 Hepatitis C Virus sequence studies mainly focus on the viral amplicon containing the Hypervariable region 1 (HVR1) to obtain a sample of sequences from which several population genetics parameters can be calculated. Recent advances in sequencing methods allow for analyzing an unprecedented number of viral variants from infected patients and present a novel opportunity for understanding viral evolution, drug resistance and immune escape. In the present paper, we compared three recent technologies for amplicon analysis: (i) Next-Generation Sequencing; (ii) Clonal sequencing using End-point Limiting-dilution for isolation of individual sequence variants followed by Real-Time PCR and sequencing; and (iii) Mass spectrometry of base-specific cleavage reactions of a target sequence. These three technologies were used to assess intra-host diversity and inter-host genetic relatedness in HVR1 amplicons obtained from 38 patients (subgenotypes 1a and 1b). Assessments of intra-host diversity varied greatly between sequence-based and mass-spectrometry-based data. However, assessments of inter-host variability by all three technologies were equally accurate in identification of genetic relatedness among viral strains. These results support the application of all three technologies for molecular epidemiology and population genetics studies. Mass spectrometry is especially promising given its high throughput, low cost and comparable results with sequence-based methods. |
CliqueSNV: Scalable Reconstruction of Intra-Host Viral Populations from NGS Reads (preprint)
Knyazev S , Tsyvina V , Melnyk A , Artyomenko A , Malygina T , Porozov YB , Campbell EM , Switzer WM , Skums P , Zelikovsky A . bioRxiv 2019 264242 Highly mutable RNA viruses such as influenza A virus, human immunodeficiency virus and hepatitis C virus exist in infected hosts as highly heterogeneous populations of closely related genomic variants. The presence of low-frequency variants with few mutations with respect to major strains may result in an immune escape, emergence of drug resistance, and an increase of virulence and infectivity. Next-generation sequencing technologies permit detection of sample intra-host viral population at extremely great depth, thus providing an opportunity to access low-frequency variants. Long read lengths offered by single-molecule sequencing technologies allow all viral variants to be sequenced in a single pass. However, high sequencing error rates limit the ability to study heterogeneous viral populations composed of rare, closely related variants. In this article, we present CliqueSNV, a novel reference-based method for reconstruction of viral variants from NGS data. It efficiently constructs an allele graph based on linkage between single nucleotide variations and identifies true viral variants by merging cliques of that graph using combinatorial optimization techniques. The new method outperforms existing methods in both accuracy and running time on experimental and simulated NGS data for titrated levels of known viral variants. For PacBio reads, it accurately reconstructs variants with frequency as low as 0.1%. For Illumina reads, it fully reconstructs main variants. The open source implementation of CliqueSNV is freely available for download at https://github.com/vyacheslav-tsivina/CliqueSNV |
Primary case inference in viral outbreaks through analysis of intra-host variant population (preprint)
Gussler JW , Campo DS , Dimitrova Z , Skums P , Khudyakov Y . bioRxiv 2020 2020.09.18.303131 Investigation of outbreaks to identify the primary case is crucial for the interruption and prevention of transmission of infectious diseases. These individuals may have a higher risk of participating in near future transmission events when compared to the other patients in the outbreak, so directing more transmission prevention resources towards these individuals is a priority. Genetic characterization of intra-host viral populations, although highly efficient in the identification of transmission clusters, is not as efficient in routing transmissions during outbreaks, owing to complexity of viral evolution. Here, we present a new computational framework, PYCIVO: primary case inference in viral outbreaks. This framework expands upon our earlier work in development of QUENTIN, which builds a probabilistic disease transmission tree based on simulation of evolution of intra-host hepatitis C virus (HCV) variants between cases involved in direct transmission during an outbreak. PYCIVO improves upon QUENTIN by also implementing a custom heterogeneity index which empowers PYCIVO to make the important ‘No primary case’ prediction. One or more samples, possibly including the primary case, may have not been sampled, and this designation is meant to account for these scenarios. These approaches were validated using a set of 105 sequence samples from 11 distinct HCV transmission clusters identified during outbreak investigations, in which the primary case was epidemiologically verified. Both models can detect the correct primary case in 9 out of 11 transmission clusters (81.8%). However, while QUENTIN issues erroneous predictions on the remaining 2 transmission clusters, PYCIVO issues a null output for these clusters, giving it an effective prediction accuracy of 100%. 
To further evaluate accuracy of the inference, we created 10 modified transmission clusters in which the primary case had been removed. In this scenario, PYCIVO was able to correctly identify that there was no primary case in 8/10 (80%) of these modified clusters. This model was validated with HCV; however, this approach may be applicable to other microbial pathogens. A version of this software is publicly available at the following URL: https://www.github.com/walkergussler/PYCIVO |
Quantitative differences between intra-host HCV populations from persons with recently established and persistent infections (preprint)
Icer Baykal PB , Lara J , Khudyakov Y , Zelikovsky A , Skums P . bioRxiv 2020 2020.06.17.157792 Background: Detection of incident hepatitis C virus (HCV) infections is crucial for identification of outbreaks and development of public health interventions. However, there is no single diagnostic assay for distinguishing recent and persistent HCV infections. HCV exists in each infected host as a heterogeneous population of genomic variants, whose evolutionary dynamics remain incompletely understood. Genetic analysis of such viral populations can be applied to the detection of incident HCV infections and used to understand intra-host viral evolution. Methods: We studied intra-host HCV populations sampled using next-generation sequencing from 98 recently and 256 persistently infected individuals. Genetic structure of the populations was evaluated using 245,878 viral sequences from these individuals and a set of selected parameters measuring their diversity, topological structure, complexity, strength of selection, epistasis, evolutionary dynamics, and physico-chemical properties. Findings: Distributions of the viral population parameters differ significantly between recent and persistent infections. A general increase in viral genetic diversity from recent to persistent infections is frequently accompanied by decline in genomic complexity and increase in structuredness of the HCV population, likely reflecting a high level of intra-host adaptation at later stages of infection. Using these findings, we developed a Machine Learning classifier for the infection staging, which yielded a detection accuracy of 95.22%, thus providing a higher accuracy than other genomic-based models. Interpretation: The detection of a strong association between several HCV genetic factors and stages of infection suggests that intra-host HCV population develops in a complex but regular and predictable manner in the course of infection. 
The proposed models may serve as a foundation of cyber-molecular assays for staging infection, that could potentially complement and/or substitute standard laboratory assays. Funding: AZ and PS were supported by NIH grant 1R01EB025022. PIB was supported by GSU MBD fellowship. Competing Interest Statement: The authors have declared no competing interest. |
Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants (preprint)
Tsyvina V , Campo DS , Sims S , Zelikovsky A , Khudyakov Y , Skums P . bioRxiv 2018 324418 Many biological analysis tasks require extraction of families of genetically similar sequences from large datasets produced by Next-generation Sequencing (NGS). Such tasks include detection of viral transmissions by analysis of all genetically close pairs of sequences from viral datasets sampled from infected individuals or studying of evolution of viruses or immune repertoires by analysis of network of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naïve algorithms to extract such sequence families are impractical in light of the massive size of modern NGS datasets. In this paper, we present a fast and scalable k-mer-based framework to perform such sequence similarity queries efficiently, which specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj |
SOPHIE: Viral Outbreak Investigation and Transmission History Reconstruction in a Joint Phylogenetic and Network Theory Framework
Skums Pavel , Mohebbi Fatemeh , Tsyvina Vyacheslav , Icer Pelin , Ramachandran Sumathi , Khudyakov Yury . Res Comput Mol Biol 2022 369-370 Reconstruction of transmission networks from viral genomes sampled from infected individuals is a major computational problem of genomic epidemiology. For this problem, we propose a maximum likelihood framework SOPHIE (SOcial and PHilogenetic Investigation of Epidemics) based on the integration of phylogenetic and random graph models. SOPHIE is scalable, accounts for intra-host diversity and accurately infers transmissions without case-specific epidemiological data. |
SOPHIE: viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework (preprint)
Skums P , Mohebbi F , Tsyvina V , Baykal PI , Nemira A , Ramachandran S , Khudyakov Y . bioRxiv 2022 05 (10) 844-856 e4 Genomic epidemiology is now widely used for viral outbreak investigations. Still, this methodology faces many challenges. First, few methods account for intra-host viral diversity. Second, maximum parsimony principle continues to be employed, even though maximum likelihood or Bayesian models are usually more consistent. Third, many methods utilize case-specific data, such as sampling times or infection exposure intervals. This impedes study of persistent infections in vulnerable groups, where such information has a limited use. Finally, most methods implicitly assume that transmission events are independent, while common source outbreaks violate this assumption. We propose a maximum likelihood framework SOPHIE (SOcial and PHilogenetic Investigation of Epidemics) based on integration of phylogenetic and random graph models. It infers transmission networks from viral phylogenies and expected properties of inter-host social networks modelled as random graphs with given expected degree distributions. SOPHIE is scalable, accounts for intra-host diversity and accurately infers transmissions without case-specific epidemiological data. SOPHIE code is freely available at https://github.com/compbel/SOPHIE/ |
SOPHIE: Viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework.
Skums P , Mohebbi F , Tsyvina V , Baykal PI , Nemira A , Ramachandran S , Khudyakov Y . Cell Syst 2022 13 (10) 844-856.e4 Genomic epidemiology is now widely used for viral outbreak investigations. Still, this methodology faces many challenges. First, few methods account for intra-host viral diversity. Second, maximum parsimony principle continues to be employed for phylogenetic inference of transmission histories, even though maximum likelihood or Bayesian models are usually more consistent. Third, many methods utilize case-specific data, such as sampling times or infection exposure intervals. This impedes study of persistent infections in vulnerable groups, where such information has a limited use. Finally, most methods implicitly assume that transmission events are independent, although common source outbreaks violate this assumption. We propose a maximum likelihood framework, SOPHIE, based on the integration of phylogenetic and random graph models. It infers transmission networks from viral phylogenies and expected properties of inter-host social networks modeled as random graphs with given expected degree distributions. SOPHIE is scalable, accounts for intra-host diversity, and accurately infers transmissions without case-specific epidemiological data. |
Primary case inference in viral outbreaks through analysis of intra-host variant population.
Gussler JW , Campo DS , Dimitrova Z , Skums P , Khudyakov Y . BMC Bioinformatics 2022 23 (1) 62 BACKGROUND: Investigation of outbreaks to identify the primary case is crucial for the interruption and prevention of transmission of infectious diseases. These individuals may have a higher risk of participating in near future transmission events when compared to the other patients in the outbreak, so directing more transmission prevention resources towards these individuals is a priority. Although the genetic characterization of intra-host viral populations can aid the identification of transmission clusters, it is not trivial to determine the directionality of transmissions during outbreaks, owing to complexity of viral evolution. Here, we present a new computational framework, PYCIVO: primary case inference in viral outbreaks. This framework expands upon our earlier work in development of QUENTIN, which builds a probabilistic disease transmission tree based on simulation of evolution of intra-host hepatitis C virus (HCV) variants between cases involved in direct transmission during an outbreak. PYCIVO improves upon QUENTIN by also adding a custom heterogeneity index and identifying the scenario when the primary case may have not been sampled. RESULTS: These approaches were validated using a set of 105 sequence samples from 11 distinct HCV transmission clusters identified during outbreak investigations, in which the primary case was epidemiologically verified. Both models can detect the correct primary case in 9 out of 11 transmission clusters (81.8%). However, while QUENTIN issues erroneous predictions on the remaining 2 transmission clusters, PYCIVO issues a null output for these clusters, giving it an effective prediction accuracy of 100%. To further evaluate accuracy of the inference, we created 10 modified transmission clusters in which the primary case had been removed. 
In this scenario, PYCIVO was able to correctly identify that there was no primary case in 8/10 (80%) of these modified clusters. This model was validated with HCV; however, this approach may be applicable to other microbial pathogens. CONCLUSIONS: PYCIVO improves upon QUENTIN by also implementing a custom heterogeneity index which empowers PYCIVO to make the important 'No primary case' prediction. One or more samples, possibly including the primary case, may have not been sampled, and this designation is meant to account for these scenarios. |
Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction.
Knyazev S , Tsyvina V , Shankar A , Melnyk A , Artyomenko A , Malygina T , Porozov YB , Campbell EM , Switzer WM , Skums P , Mangul S , Zelikovsky A . Nucleic Acids Res 2021 49 (17) e102 Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient's treatment plan preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that still remains open. Here we propose CliqueSNV based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identifying minority haplotypes with the frequency below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short and long bases sequencing platforms. |
Quantitative differences between intra-host HCV populations from persons with recently established and persistent infections.
Icer Baykal PB , Lara J , Khudyakov Y , Zelikovsky A , Skums P . Virus Evol 2021 7 (1) veaa103 Detection of incident hepatitis C virus (HCV) infections is crucial for identification of outbreaks and development of public health interventions. However, there is no single diagnostic assay for distinguishing recent and persistent HCV infections. HCV exists in each infected host as a heterogeneous population of genomic variants, whose evolutionary dynamics remain incompletely understood. Genetic analysis of such viral populations can be applied to the detection of incident HCV infections and used to understand intra-host viral evolution. We studied intra-host HCV populations sampled using next-generation sequencing from 98 recently and 256 persistently infected individuals. Genetic structure of the populations was evaluated using 245,878 viral sequences from these individuals and a set of selected features measuring their diversity, topological structure, complexity, strength of selection, epistasis, evolutionary dynamics, and physico-chemical properties. Distributions of the viral population features differ significantly between recent and persistent infections. A general increase in viral genetic diversity from recent to persistent infections is frequently accompanied by decline in genomic complexity and increase in structuredness of the HCV population, likely reflecting a high level of intra-host adaptation at later stages of infection. Using these findings, we developed a machine learning classifier for the infection staging, which yielded a detection accuracy of 95.22 per cent, thus providing a higher accuracy than other genomic-based models. The detection of a strong association between several HCV genetic factors and stages of infection suggests that intra-host HCV population develops in a complex but regular and predictable manner in the course of infection. 
The proposed models may serve as a foundation of cyber-molecular assays for staging infection, which could potentially complement and/or substitute standard laboratory assays. |
Accurate spatiotemporal mapping of drug overdose deaths by machine learning of drug-related web-searches.
Campo DS , Gussler JW , Sue A , Skums P , Khudyakov Y . PLoS One 2020 15 (12) e0243622 Persons who inject drugs (PWID) are at increased risk for overdose death (ODD), infections with HIV, hepatitis B (HBV) and hepatitis C virus (HCV), and noninfectious health conditions. Spatiotemporal identification of PWID communities is essential for developing efficient and cost-effective public health interventions for reducing morbidity and mortality associated with injection-drug use (IDU). Reported ODDs are a strong indicator of the extent of IDU in different geographic regions. However, ODD quantification can take time, with delays in ODD reporting occurring due to a range of factors including death investigation and drug testing. This delayed ODD reporting may affect efficient early interventions for infectious diseases. We present a novel model, Dynamic Overdose Vulnerability Estimator (DOVE), for assessment and spatiotemporal mapping of ODDs in different U.S. jurisdictions. Using Google® Web-search volumes (i.e., the fraction of all searches that include certain words), we identified a strong association between the reported ODD rates and drug-related search terms for 2004-2017. A machine learning model (Extremely Random Forest) was developed to produce yearly ODD estimates at state and county levels, as well as monthly estimates at state level. Regarding the total number of ODDs per year, DOVE's error was only 3.52% (Median Absolute Error, MAE) in the United States for 2005-2017. DOVE estimated 66,463 ODDs out of the reported 70,237 (94.48%) during 2017. For that year, the MAE of the individual ODD rates was 4.43%, 7.34%, and 12.75% among yearly estimates for states, yearly estimates for counties, and monthly estimates for states, respectively. These results indicate suitability of the DOVE ODD estimates for dynamic IDU assessment in most states, which may alert for possible increased morbidity and mortality associated with IDU. 
ODD estimates produced by DOVE offer an opportunity for a spatiotemporal ODD mapping. Timely identification of potential mortality trends among PWID might assist in developing efficient ODD prevention and HBV, HCV, and HIV infection elimination programs by targeting public health interventions to the most vulnerable PWID communities. |
Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants.
Tsyvina V , Campo DS , Sims S , Zelikovsky A , Khudyakov Y , Skums P . BMC Bioinformatics 2018 19 360 BACKGROUND: Many biological analysis tasks require extraction of families of genetically similar sequences from large datasets produced by Next-generation Sequencing (NGS). Such tasks include detection of viral transmissions by analysis of all genetically close pairs of sequences from viral datasets sampled from infected individuals or studying of evolution of viruses or immune repertoires by analysis of network of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naïve algorithms to extract such sequence families are impractical in light of the massive size of modern NGS datasets. RESULTS: In this paper, we present a fast and scalable k-mer-based framework to perform such sequence similarity queries efficiently, which specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. It shows better filtering quality and time performance when compared to other tools. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj CONCLUSION: The proposed tool allows for efficient detection of genetic relatedness between genomic samples produced by deep sequencing of heterogeneous populations. It should be especially useful for analysis of relatedness of genomes of viruses with unevenly distributed variable genomic regions, such as HIV and HCV. In the future, we envision that, besides applications in molecular epidemiology, the tool can also be adapted to immunosequencing and metagenomics data. |
QUENTIN: reconstruction of disease transmissions from viral quasispecies genomic data.
Skums P , Zelikovsky A , Singh R , Gussler W , Dimitrova Z , Knyazev S , Mandric I , Ramachandran S , Campo D , Jha D , Bunimovich L , Costenbader E , Sexton C , O'Connor S , Xia GL , Khudyakov Y . Bioinformatics 2018 34 (1) 163-170 Motivation: Genomic analysis has become one of the major tools for disease outbreak investigations. However, existing computational frameworks for inference of transmission history from viral genomic data often do not consider intra-host diversity of pathogens and heavily rely on additional epidemiological data, such as sampling times and exposure intervals. This impedes genomic analysis of outbreaks of highly mutable viruses associated with chronic infections, such as human immunodeficiency virus and hepatitis C virus, whose transmissions are often carried out through minor intra-host variants, while the additional epidemiological information often is either unavailable or has a limited use. Results: The proposed framework QUasispecies Evolution, Network-based Transmission INference (QUENTIN) addresses the above challenges by evolutionary analysis of intra-host viral populations sampled by deep sequencing and Bayesian inference using general properties of social networks relevant to infection dissemination. This method allows inference of transmission direction even without the supporting case-specific epidemiological information, identification of transmission clusters and reconstruction of transmission history. QUENTIN was validated on experimental and simulated data, and applied to investigate HCV transmission within a community of hosts with high-risk behavior. It is available at https://github.com/skumsp/QUENTIN. Contact: pskums@gsu.edu or alexz@cs.gsu.edu or rahul@sfsu.edu or yek0@cdc.gov. Supplementary information: Supplementary data are available at Bioinformatics online. |
Inference of genetic relatedness between viral quasispecies from sequencing data.
Glebova O , Knyazev S , Melnyk A , Artyomenko A , Khudyakov Y , Zelikovsky A , Skums P . BMC Genomics 2017 18 918 BACKGROUND: RNA viruses such as HCV and HIV mutate at extremely high rates, and as a result, they exist in infected hosts as populations of genetically related variants. Recent advances in sequencing technologies make it possible to identify such populations at great depth. In particular, these technologies provide new opportunities for inference of relatedness between viral samples, identification of transmission clusters and sources of infection, which are crucial tasks for viral outbreak investigations. RESULTS: We present (i) an evolutionary simulation algorithm Viral Outbreak InferenCE (VOICE) inferring genetic relatedness, (ii) an algorithm MinDistB detecting possible transmission using minimal distances between intra-host viral populations and sizes of their relative borders, and (iii) a non-parametric recursive clustering algorithm Relatedness Depth (ReD) analyzing clusters' structure to infer possible transmissions and their directions. All proposed algorithms were validated using real sequencing data from HCV outbreaks. CONCLUSIONS: All algorithms are applicable to the analysis of outbreaks of highly heterogeneous RNA viruses. Our experimental validation shows that they can successfully identify genetic relatedness between viral populations, as well as infer transmission clusters and outbreak sources. |
GHOST: global hepatitis outbreak and surveillance technology.
Longmire AG , Sims S , Rytsareva I , Campo DS , Skums P , Dimitrova Z , Ramachandran S , Medrzycki M , Thai H , Ganova-Raeva L , Lin Y , Punkova LT , Sue A , Mirabito M , Wang S , Tracy R , Bolet V , Sukalac T , Lynberg C , Khudyakov Y . BMC Genomics 2017 18 916 BACKGROUND: Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Effective HCV outbreak investigation requires comprehensive surveillance and robust case investigation. We previously developed and validated a methodology for the rapid and cost-effective identification of HCV transmission clusters. Global Hepatitis Outbreak and Surveillance Technology (GHOST) is a cloud-based system enabling users, regardless of computational expertise, to analyze and visualize transmission clusters in an independent, accurate and reproducible way. RESULTS: We present and explore performance of several GHOST implemented algorithms using next-generation sequencing data experimentally obtained from hypervariable region 1 of genetically related and unrelated HCV strains. GHOST processes data from an entire MiSeq run in approximately 3 h. A panel of seven specimens was used for preparation of six repeats of MiSeq libraries. Testing sequence data from these libraries by GHOST showed a consistent transmission linkage detection, testifying to high reproducibility of the system. Lack of linkage among genetically unrelated HCV strains and constant detection of genetic linkage between HCV strains from known transmission pairs and from follow-up specimens at different levels of MiSeq-read sampling indicate high specificity and sensitivity of GHOST in accurate detection of HCV transmission. 
CONCLUSIONS: GHOST enables automatic extraction of timely and relevant public health information suitable for guiding effective intervention measures. It is designed as a virtual diagnostic system intended for use in molecular surveillance and outbreak investigations rather than in research. The system produces accurate and reproducible information on HCV transmission clusters for all users, irrespective of their level of bioinformatics expertise. Improvement in molecular detection capacity will contribute to increasing the rate of transmission detection, thus providing opportunity for rapid, accurate and effective response to outbreaks of hepatitis C. Although GHOST was originally developed for hepatitis C surveillance, its modular structure is readily applicable to other infectious diseases. Worldwide availability of GHOST for the detection of HCV transmissions will foster deeper involvement of public health researchers and practitioners in hepatitis C outbreak investigation. |
Increased Mitochondrial Genetic Diversity in Persons Infected With Hepatitis C Virus.
Campo DS , Roh HJ , Pearlman BL , Fierer DS , Ramachandran S , Vaughan G , Hinds A , Dimitrova Z , Skums P , Khudyakov Y . Cell Mol Gastroenterol Hepatol 2016 2 (5) 676-684 Background & Aims: The host genetic environment contributes significantly to the outcomes of hepatitis C virus (HCV) infection and therapy response, but little is known about any effects of HCV infection on the host beyond any changes related to adaptive immune responses. HCV persistence is associated strongly with mitochondrial dysfunction, with liver mitochondrial DNA (mtDNA) genetic diversity linked to disease progression. Methods: We evaluated the genetic diversity of 2 mtDNA genomic regions (hypervariable segments 1 and 2) obtained from sera of 116 persons using next-generation sequencing. Results: Results were as follows: (1) the average diversity among cases with seronegative acute HCV infection was 4.2 times higher than among uninfected controls; (2) the diversity level among cases with chronic HCV infection was 96.1 times higher than among uninfected controls; and (3) the diversity was 23.1 times higher among chronic than acute cases. In 2 patients who were followed up during combined interferon and ribavirin therapy, mtDNA nucleotide diversity decreased dramatically after the completion of therapy in both patients: by 100% in patient A after 54 days and by 70.51% in patient B after 76 days. Conclusions: HCV infection strongly affects mtDNA genetic diversity. A rapid decrease in mtDNA genetic diversity observed after therapy-induced HCV clearance suggests that the effect is reversible, emphasizing dynamic genetic relationships between HCV and mitochondria. The level of mtDNA nucleotide diversity can be used to discriminate recent from past infections, which should facilitate the detection of recent transmission events and thus help identify modes of transmission. |
Next-Generation Sequencing Reveals Frequent Opportunities for Exposure to Hepatitis C Virus in Ghana.
Forbi JC , Layden JE , Phillips RO , Mora N , Xia GL , Campo DS , Purdy MA , Dimitrova ZE , Owusu DO , Punkova LT , Skums P , Owusu-Ofori S , Sarfo FS , Vaughan G , Roh H , Opare-Sem OK , Cooper RS , Khudyakov YE . PLoS One 2015 10 (12) e0145530 Globally, hepatitis C virus (HCV) infection is responsible for a large proportion of persons with liver disease, including cancer. The infection is highly prevalent in sub-Saharan Africa. West Africa was identified as a geographic origin of two HCV genotypes. However, little is known about the genetic composition of HCV populations in many countries of the region. Using conventional and next-generation sequencing (NGS), we identified and genetically characterized 65 HCV strains circulating among HCV-positive blood donors in Kumasi, Ghana. Phylogenetic analysis using consensus sequences derived from 3 genomic regions of the HCV genome, 5'-untranslated region, hypervariable region 1 (HVR1) and NS5B gene, consistently classified the HCV variants (n = 65) into genotype 1 (HCV-1, 15%) and genotype 2 (HCV-2, 85%). The Ghanaian and West African HCV-2 NS5B sequences were found completely intermixed in the phylogenetic tree, indicating a substantial genetic heterogeneity of HCV-2 in Ghana. Analysis of HVR1 sequences from intra-host HCV variants obtained by NGS showed that three donors were infected with >1 HCV strain, including infections with 2 genotypes. Two other donors shared an HCV strain, indicating HCV transmission between them. The HCV-2 strain sampled from one donor was replaced with another HCV-2 strain after only 2 months of observation, indicating rapid strain switching. Bayesian analysis estimated that the HCV-2 strains in Ghana have been expanding since the 16th century. The blood donors in Kumasi, Ghana, are infected with a very heterogeneous HCV population of HCV-1 and HCV-2, with HCV-2 being prevalent. 
The detection of three cases of co- or super-infections and transmission linkage between 2 cases suggests frequent opportunities for HCV exposure among the blood donors and is consistent with the reported high HCV prevalence. The conditions for effective HCV-2 transmission have existed for approximately 3-4 centuries, indicating a long epidemic history of HCV-2 in Ghana. |
Accurate genetic detection of hepatitis C virus transmissions in outbreak settings.
Campo DS , Xia GL , Dimitrova Z , Lin Y , Forbi JC , Ganova-Raeva L , Punkova L , Ramachandran S , Thai H , Skums P , Sims S , Rytsareva I , Vaughan G , Roh HJ , Purdy MA , Sue A , Khudyakov Y . J Infect Dis 2015 213 (6) 957-65 Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Here, we developed and validated a simple approach for molecular detection of HCV transmissions in outbreak settings. We obtained sequences from the HCV hypervariable region 1 (HVR1) using End-Point Limiting-Dilution (EPLD) from 127 cases involved in 32 epidemiologically defined HCV outbreaks and 193 individuals with unrelated HCV strains. We compared several types of genetic distances and calculated a threshold using minimal Hamming distances that identifies transmission clusters in all tested outbreaks with 100% accuracy. The approach was also validated on sequences from 239 individuals obtained using next-generation sequencing, showing the same accuracy as EPLD. On average, nucleotide diversity of the intra-host population was 6.2 times greater in the source than in any incident case, allowing the correct detection of transmission direction in 8 outbreaks for which source cases were known. The simple and accurate distance-based approach for detecting HCV transmissions developed here streamlines molecular investigation of outbreaks, thus improving the public health capacity for rapid and effective control of hepatitis C. |
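The distance-based linkage idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a relative (per-site) Hamming distance, the connected-component grouping, and the `threshold` value in the usage note are all assumptions made for illustration, and the abstract does not state the actual threshold.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(pop_a, pop_b):
    """Smallest relative Hamming distance over all variant pairs of two samples."""
    return min(hamming(a, b) / len(a) for a in pop_a for b in pop_b)


def transmission_clusters(samples, threshold):
    """Group samples whose minimal inter-sample distance is below `threshold`.

    `samples` maps a sample name to a list of aligned variant sequences.
    `threshold` is a hypothetical parameter; its value must come from validation data.
    """
    parent = {name: name for name in samples}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if min_distance(samples[a], samples[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in samples:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(c) for c in clusters.values())
```

With toy data such as `{"A": ["ACGTACGTAC", "ACGTACGTAA"], "B": ["ACGTACGTAT"], "C": ["TTTTTTTTTT"]}` and an illustrative threshold of 0.2, samples A and B fall into one cluster and C stays alone.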
Good laboratory practice for clinical next-generation sequencing informatics pipelines.
Gargis AS , Kalman L , Bick DP , da Silva C , Dimmock DP , Funke BH , Gowrisankar S , Hegde MR , Kulkarni S , Mason CE , Nagarajan R , Voelkerding KV , Worthey EA , Aziz N , Barnes J , Bennett SF , Bisht H , Church DM , Dimitrova Z , Gargis SR , Hafez N , Hambuch T , Hyland FC , Luna RA , MacCannell D , Mann T , McCluskey MR , McDaniel TK , Ganova-Raeva LM , Rehm HL , Reid J , Campo DS , Resnick RB , Ridge PG , Salit ML , Skums P , Wong LJ , Zehnbauer BA , Zook JM , Lubin IM . Nat Biotechnol 2015 33 (7) 689-93 We report principles and guidelines (Supplementary Note) that were developed by the Next-Generation Sequencing: Standardization of Clinical Testing II (Nex-StoCT II) informatics workgroup, which was first convened on October 11–12, 2012, in Atlanta, Georgia, by the US Centers for Disease Control and Prevention (CDC; Atlanta, GA). We present here recommendations for the design, optimization and implementation of an informatics pipeline for clinical next-generation sequencing (NGS) to detect germline sequence variants in compliance with existing regulatory and professional quality standards1. The workgroup, which included informatics experts, clinical and research laboratory professionals, physicians with experience in interpreting NGS results, NGS test platform and software developers and participants from US government agencies and professional organizations, also discussed the use of NGS in testing for cancer and infectious disease. A typical NGS analytical process and selected workgroup recommendations are summarized in Table 1, and detailed in the guidelines presented in the Supplementary Note. |
Cryptic Hepatitis B and E in Patients With Acute Hepatitis of Unknown Etiology.
Ganova-Raeva L , Punkova L , Campo DS , Dimitrova Z , Skums P , Vu NH , Dat DT , Dalton HR , Khudyakov Y . J Infect Dis 2015 212 (12) 1962-9 BACKGROUND: Up to 30% of acute viral hepatitis has no known etiology. To determine the disease etiology in patients with acute hepatitis of unknown etiology (HUE), serum specimens were obtained from 38 patients residing in the United Kingdom and Vietnam and from 26 healthy US blood donors. All specimens tested negative for known viral infections causing hepatitis, using commercially available serological and nucleic acid assays. METHODS: Specimens were processed by sequence-independent complementary DNA amplification and next-generation sequencing (NGS). Sufficient material for individual NGS libraries was obtained from 12 HUE cases and 26 blood donors; the remaining HUE cases were sequenced as a pool. Read mapping was done by targeted and de novo assembly. RESULTS: Sequences from hepatitis B virus (HBV) were detected in 7 individuals with HUE (58.3%) and the pooled library, and hepatitis E virus (HEV) was detected in 2 individuals with HUE (16.7%) and the pooled library. Both HEV-positive cases were coinfected with HBV. HBV sequences belonged to genotypes A, D, or G, and HEV sequences belonged to genotype 3. No known hepatotropic viruses were detected in the tested normal human sera. CONCLUSIONS: NGS-based detection of HBV and HEV infections is more sensitive than using commercially available assays. HBV and HEV may be cryptically associated with HUE. |
Antigenic cooperation among intrahost HCV variants organized into a complex network of cross-immunoreactivity.
Skums P , Bunimovich L , Khudyakov Y . Proc Natl Acad Sci U S A 2015 112 (21) 6653-8 Hepatitis C virus (HCV) has the propensity to cause chronic infection. Continuous immune escape has been proposed as a mechanism of intrahost viral evolution contributing to HCV persistence. Although the pronounced genetic diversity of intrahost HCV populations supports this hypothesis, recent observations of long-term persistence of individual HCV variants, negative selection increase, and complex dynamics of viral subpopulations during infection as well as broad cross-immunoreactivity (CR) among variants are inconsistent with the immune-escape hypothesis. Here, we present a mathematical model of intrahost viral population dynamics under the condition of a complex CR network (CRN) of viral variants and examine the contribution of CR to establishing persistent HCV infection. The model suggests a mechanism of viral adaptation by antigenic cooperation (AC), with immune responses against one variant protecting other variants. AC reduces the capacity of the host's immune system to neutralize certain viral variants. CRN structure determines specific roles for each viral variant in host adaptation, with variants eliciting broad-CR antibodies facilitating persistence of other variants immunoreacting with these antibodies. The proposed mechanism is supported by empirical observations of intrahost HCV evolution. Interference with AC is a potential strategy for interruption and prevention of chronic HCV infection. |
Nosocomial hepatitis C virus transmission from tampering with injectable anesthetic opioids
Hatia RI , Dimitrova Z , Skums P , Teo EY , Teo CG . Hepatology 2015 62 (1) 101-10 The extent of provider-to-patient hepatitis C virus (HCV) transmission from diversion, self-injection and substitution ("tampering") of anesthetic opioids is unknown. To quantify the contribution of opioid tampering to nosocomial hepatitis C outbreaks, data from healthcare-related hepatitis C outbreaks occurring in developed countries from 1990-2012 were collated, grouped and compared. Tampering was associated with 17% (8/46) of outbreaks but 53% (438/833) of cases. Of the tampering outbreaks, 6 (75%) involved fentanyl, 5 (63%) occurred in the United States, and one each in Australia, Israel and Spain. Case counts ranged from 5-275 in the tampering outbreaks (mean, 54.8; median, 25), and 1-99 in the non-tampering outbreaks (mean, 10.4; median, 5); between them, the difference in mean ranks of counts was significant (p<0.01). To estimate HCV transmission risks from tampering, risk-assessment models were constructed, and these risks compared with those from surgery. The HCV transmission risk from exposure to an opioid preparation tampered by a provider of unknown HCV-infection status who is a person who injects drugs (PWID) (0.62%; standard error [e]=0.38%) exceeds 16,757 times the risk from surgery by a surgeon of unknown HCV-infection status (0.000037%; e=0.000029%), and 135 times by an HCV-infected surgeon (0.0046%; e=0.0033%). To pose a 50% patient transmission risk, an infected surgeon may take 30 years, compared to <1 year for a PWID-tamperer, and weeks or days for a PWID-tamperer who intensifies access to opioids. CONCLUSION: Disproportionately many cases of HCV infection from nosocomial outbreaks were attributable to provider tampering of anesthetic opioids. The transmission risk from tampering is substantially higher than from surgery. |
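The proportions and risk ratios quoted in the abstract above can be rechecked with simple arithmetic. The sketch below only reuses figures stated in the abstract; the variable names are illustrative.

```python
# Share of outbreaks and of cases associated with tampering (counts from the abstract).
outbreak_share = 8 / 46
case_share = 438 / 833

# Per-exposure transmission risks, in percent, as quoted in the abstract.
risk_tamper = 0.62               # opioid tampered with by a PWID provider of unknown HCV status
risk_surgeon_unknown = 0.000037  # surgery by a surgeon of unknown HCV status
risk_surgeon_infected = 0.0046   # surgery by an HCV-infected surgeon

# Ratios of the tampering risk to each surgical risk.
ratio_unknown = risk_tamper / risk_surgeon_unknown
ratio_infected = risk_tamper / risk_surgeon_infected

print(f"{outbreak_share:.0%} of outbreaks, {case_share:.0%} of cases")
print(f"{ratio_unknown:,.0f}x and {ratio_infected:.0f}x the surgical risk")
```

The rounded results agree with the 17%, 53%, 16,757-fold, and 135-fold figures reported in the abstract.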
Computational framework for next-generation sequencing of heterogeneous viral populations using combinatorial pooling.
Skums P , Artyomenko A , Glebova O , Ramachandran S , Mandoiu I , Campo DS , Dimitrova Z , Zelikovsky A , Khudyakov Y . Bioinformatics 2014 31 (5) 682-90 MOTIVATION: Next-generation sequencing (NGS) allows for analyzing a large number of viral sequences from infected patients, providing an opportunity to implement large-scale molecular surveillance of viral diseases. However, despite improvements in technology, traditional protocols for NGS of large numbers of samples are still highly cost- and labor-intensive. One of the possible cost-effective alternatives is combinatorial pooling. Although a number of pooling strategies for consensus sequencing of DNA samples and detection of SNPs have been proposed, these strategies cannot be applied to sequencing of highly heterogeneous viral populations. RESULTS: We developed a cost-effective and reliable protocol for sequencing of viral samples, that combines NGS using barcoding and combinatorial pooling and a computational framework including algorithms for optimal virus-specific pools design and deconvolution of individual samples from sequenced pools. Evaluation of the framework on experimental and simulated data for hepatitis C virus showed that it substantially reduces the sequencing costs and allows deconvolution of viral populations with a high accuracy. AVAILABILITY: The source code and experimental data sets are available at http://alan.cs.gsu.edu/NGS/?q=content/pooling |
Next-generation sequencing reveals large connected networks of intra-host HCV variants.
Campo DS , Dimitrova Z , Yamasaki L , Skums P , Lau DT , Vaughan G , Forbi JC , Teo CG , Khudyakov Y . BMC Genomics 2014 15 Suppl 5 S4 BACKGROUND: Next-generation sequencing (NGS) allows for sampling numerous viral variants from infected patients. This provides a novel opportunity to represent and study the mutational landscape of Hepatitis C Virus (HCV) within a single host. RESULTS: Intra-host variants of the HCV E1/E2 region were extensively sampled from 58 chronically infected patients. After NGS error correction, the average number of reads and variants obtained from each sample were 3202 and 464, respectively. The distance between each pair of variants was calculated and networks were created for each patient, where each node is a variant and two nodes are connected by a link if the nucleotide distance between them is 1. The work focused on large components having > 5% of all reads, which on average account for 93.7% of all reads found in a patient. CONCLUSIONS: Most intra-host variants are organized into distinct single-mutation components that are well separated from each other, represent genetic distances between viral variants, and are robust to sampling, reproducible and likely seeded during transmission events. Facilitated by NGS, large components offer a novel evolutionary framework for genetic analysis of intra-host viral populations and understanding transmission, immune escape and drug resistance. |
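The network construction described in the abstract above (variants as nodes, a link when the nucleotide distance is 1, and a focus on components holding more than 5% of reads) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name, the input format (a mapping from aligned variant sequence to read count), and the quadratic pairwise comparison are all hypothetical choices.

```python
def one_step_components(variant_counts, min_read_share=0.05):
    """Build the one-mutation network and return (all components, large components).

    `variant_counts` maps each aligned variant sequence to its read count.
    Each component is reported as (sorted variants, share of all reads).
    """
    variants = list(variant_counts)
    total_reads = sum(variant_counts.values())

    # Link two variants when they differ at exactly one nucleotide position.
    neighbors = {v: [] for v in variants}
    for i, a in enumerate(variants):
        for b in variants[i + 1:]:
            if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1:
                neighbors[a].append(b)
                neighbors[b].append(a)

    # Depth-first search to collect connected components.
    seen = set()
    components = []
    for start in variants:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            current = stack.pop()
            members.append(current)
            for nxt in neighbors[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        reads = sum(variant_counts[m] for m in members)
        components.append((sorted(members), reads / total_reads))

    large = [c for c in components if c[1] > min_read_share]
    return components, large
```

For the toy input `{"AAAA": 60, "AAAT": 25, "AATT": 11, "CCCC": 4}`, the first three variants form one component holding 96% of reads, while `CCCC` forms a small component that falls below the 5% cutoff.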
Intra-host diversity and evolution of hepatitis C virus endemic to Côte d'Ivoire.
Forbi JC , Campo DS , Purdy MA , Dimitrova ZE , Skums P , Xia GL , Punkova LT , Ganova-Raeva LM , Vaughan G , Ben-Ayed Y , Switzer WM , Khudyakov YE . J Med Virol 2014 86 (5) 765-71 Hepatitis C virus (HCV) infection presents an important but underappreciated public health problem in Africa. In Côte d'Ivoire, very little is known about the molecular dynamics of HCV infection. Plasma samples (n = 608) from pregnant women collected in 1995 from Côte d'Ivoire were analyzed in this study. Only 18 specimens (approximately 3%) were found to be HCV PCR-positive. Phylogenetic analysis of the HCV NS5b sequences showed that the HCV variants belong to genotype 1 (HCV1) (n = 12, 67%) and genotype 2 (HCV2) (n = 6, 33%), with a maximum genetic diversity among HCV variants in each genotype being 20.7% and 24.0%, respectively. Although all HCV2 variants were genetically distant from each other, six HCV1 variants formed two tight sub-clusters belonging to HCV1a and HCV1b. Analysis of molecular variance (AMOVA) showed that the genetic structure of HCV isolates from West Africa, Côte d'Ivoire included, was significantly different from that of Central African strains (P = 0.0001). Examination of intra-host viral populations using next-generation sequencing of the HCV HVR1 showed a significant variation in intra-host genetic diversity among infected individuals, with some strains composed of sub-populations as distant from each other as viral populations from different hosts. Collectively, the results indicate a complex HCV evolution in Côte d'Ivoire, similar to the rest of West Africa, and suggest a unique HCV epidemic history in the country. |
Drug-resistance of a viral population and its individual intra-host variants during the first 48 hours of therapy
Campo DS , Skums P , Dimitrova Z , Vaughan G , Forbi JC , Teo CG , Khudyakov Y , Lau DT . Clin Pharmacol Ther 2014 95 (6) 627-35 Using HCV and IFN-resistance as a proof of concept, we have devised a new methodology for calculating the effect of a drug over a viral population and the resistance of its individual intra-host variants. By means of next-generation sequencing, HCV variants were obtained from sera collected at 9 time-points from 16 patients during the first 48 hours after injection of IFN-α. IFN-resistance coefficients were calculated for individual variants using changes in their relative frequencies, and for the entire intra-host viral population using changes in viral titer during the initial 48 hours. Population-wide resistance and presence of IFN-resistant variants were highly associated with pegIFN-α2a/RBV treatment outcome at week 12 (p = 3.78 × 10^-5 and 0.0114, respectively). This new method allows an accurate measurement of resistance based solely on changes in viral titer or the relative frequency of intra-host viral variants during a short observation time. |
Reconstruction of viral population structure from next-generation sequencing data using multicommodity flows.
Skums P , Mancuso N , Artyomenko A , Tork B , Mandoiu I , Khudyakov Y , Zelikovsky A . BMC Bioinformatics 2013 14 Suppl 9 S2 BACKGROUND: Highly mutable RNA viruses exist in infected hosts as heterogeneous populations of genetically close variants known as quasispecies. Next-generation sequencing (NGS) allows for analysing a large number of viral sequences from infected patients, presenting a novel opportunity for studying the structure of a viral population and understanding virus evolution, drug resistance and immune escape. Accurate reconstruction of genetic composition of intra-host viral populations involves assembling the NGS short reads into whole-genome sequences and estimating frequencies of individual viral variants. Although a few approaches were developed for this task, accurate reconstruction of quasispecies populations remains largely unresolved. RESULTS: Two new methods, AmpMCF and ShotMCF, for reconstruction of the whole-genome intra-host viral variants and estimation of their frequencies were developed, based on Multicommodity Flows (MCFs). AmpMCF was designed for NGS reads obtained from individual PCR amplicons and ShotMCF for NGS shotgun reads. While AmpMCF, based on covering formulation, identifies a minimal set of quasispecies explaining all observed reads, ShotMCF, based on packing formulation, engages the maximal number of reads to generate the most probable set of quasispecies. Both methods were evaluated on simulated data in comparison to Maximum Bandwidth and ViSpA, previously developed state-of-the-art algorithms for estimating quasispecies spectra from the NGS amplicon and shotgun reads, respectively. Both algorithms were accurate in estimation of quasispecies frequencies, especially from large datasets. CONCLUSIONS: The problem of viral population reconstruction from amplicon or shotgun NGS reads was solved using the MCF formulation. 
The two methods, ShotMCF and AmpMCF, developed here afford accurate reconstruction of the structure of intra-host viral population from NGS reads. The implementations of the algorithms are available at http://alan.cs.gsu.edu/vira.html (AmpMCF) and http://alan.cs.gsu.edu/NGS/?q=content/shotmcf (ShotMCF). |
Efficient error correction for next-generation sequencing of viral amplicons.
Skums P , Dimitrova Z , Campo DS , Vaughan G , Rossi L , Forbi JC , Yokosawa J , Zelikovsky A , Khudyakov Y . BMC Bioinformatics 2012 13 Suppl 10 S6 BACKGROUND: Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. RESULTS: In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. CONCLUSIONS: Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm. |
- Page last reviewed: Feb 1, 2024
- Page last updated: Apr 22, 2024